27 research outputs found

Re-estimation of Lexical Parameters for Treebank PCFGs

Author: Tejaswini Deoskar
Publication date: 01/01/2008

We present procedures which pool lexical information estimated from unlabeled data via the Inside-Outside algorithm, with lexical information from a treebank PCFG. The procedures produce substantial improvements (up to 31.6% error reduction) on the task of determining subcategorization frames of novel verbs, relative to a smoothed Penn Treebank-trained PCFG. Even with relatively small quantities of unlabeled training data, the re-estimated models show promising improvements in labeled bracketing F-scores on Wall Street Journal parsing, and substantial benefit in acquiring the subcategorization preferences of low-frequency verbs.

CiteSeerX

Crossref

Constructive Type-Logical Supertagging with Self-Attention Networks

Author: Deoskar Tejaswini
Kogkalidis Konstantinos
Moortgat Michael
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 24/05/2019

We propose a novel application of self-attention networks towards grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained on constructing types inductively. In addition to achieving a high overall type accuracy, our model is able to learn the syntax of the grammar's type system along with its denotational semantics. This lifts the closed world assumption commonly made by lexicalized grammar supertaggers, greatly enhancing its generalization potential. This is evidenced both by its adequate accuracy over sparse word types and its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, was as of yet unaccomplished.
Comment: REPL4NLP 4, ACL 201

arXiv.org e-Print Archive

Utrecht University Repository

Generating image captions with external encyclopedic knowledge

Author: Deoskar Tejaswini
Nikiforova Sofia
Paperno Denis
Winter Yoad
Publication date: 10/10/2022

Accurately reporting what objects are depicted in an image is largely a solved problem in automatic caption generation. The next big challenge on the way to truly humanlike captioning is being able to incorporate the context of the image and related real world knowledge. We tackle this challenge by creating an end-to-end caption generation system that makes extensive use of image-specific encyclopedic data. Our approach includes a novel way of using image location to identify relevant open-domain facts in an external knowledge base, with their subsequent integration into the captioning pipeline at both the encoding and decoding stages. Our system is trained and tested on a new dataset with naturally produced knowledge-rich captions, and achieves significant improvements over multiple baselines. We empirically demonstrate that our approach is effective for generating contextualized captions with encyclopedic knowledge that is both factually accurate and relevant to the image.

arXiv.org e-Print Archive

Utrecht University Repository

Shift-Reduce CCG Parsing using Neural Network Models

Author: Ambati Bharat Ram
Deoskar Tejaswini
Steedman Mark
Publication date: 01/01/2016

Edinburgh Research Explorer

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Using CCG categories to improve Hindi dependency parsing

Author: Ambati Bharat Ram
Deoskar Tejaswini
Steedman Mark
Publication date: 01/08/2013

Edinburgh Research Explorer

Generating image captions with external encyclopedic knowledge

Author: Deoskar Tejaswini
Nikiforova Sofia
Paperno Denis
Vinter Seggev Yoad
Publication venue: 'Center for Open Science'
Publication date: 10/10/2022

Utrecht University Repository

Simple Semi-Supervised Learning for Prepositional Phrase Attachment

Author: Birch Alexandra
Coppola Gregory F.
Deoskar Tejaswini
Steedman Mark
Publication date: 01/01/2011

Edinburgh Research Explorer

An Incremental Algorithm for Transition-based CCG Parsing

Author: Ambati Bharat Ram
Deoskar Tejaswini
Johnson Mark
Steedman Mark
Publication date: 01/01/2015

Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We describe a new algorithm for incremental transition-based Combinatory Categorial Grammar parsing. As English CCGbank derivations are mostly right branching and non-incremental, we design our algorithm based on the dependencies resolved rather than the derivation. We introduce two new actions in the shift-reduce paradigm based on the idea of 'revealing' (Pareschi and Steedman, 1987) the required information during parsing. On the standard CCGbank test data, our algorithm achieved improvements of 0.88% in labeled and 2.0% in unlabeled F-score over a greedy non-incremental shift-reduce parser.
11 page(s)

CiteSeerX

Crossref

Edinburgh Research Explorer

Macquarie University ResearchOnline

A comparison of latent semantic analysis and correspondence analysis of document-term matrices

Author: Deoskar Tejaswini
Hessen Dave
Qi Qianqian
van der Heijden P.G.M.
Publication venue: 'Center for Open Science'
Publication date: 14/07/2022

Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition (SVD) for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices. We show that CA has some attractive properties as compared to LSA, for instance that effects of margins arising from differing document lengths and term frequencies are effectively eliminated, so that the CA solution is optimally suited to focus on relationships among documents and terms. A unifying framework is proposed that includes both CA and LSA as special cases. We empirically compare CA to various LSA-based methods on text categorization in English and authorship attribution on historical Dutch texts, and find that CA performs significantly better. We also apply CA to a long-standing question regarding the authorship of the Dutch national anthem Wilhelmus and provide further support that it can be attributed to the author Datheen, amongst several contenders.

arXiv.org e-Print Archive

Utrecht University Repository

Hindi CCGbank: CCG Treebank from the Hindi Dependency Treebank

Author: A Bharati
A Joshi
A Mahajan
B Kumari
Bharat Ram Ambati
C Shastri
D Hays
J Hockenmaier
J Nivre
J Robinson
M Kuhlmann
M Lewis
M Palmer
M Steedman
Mark Steedman
MP Marcus
N Xue
S Clark
S Reddy
S Uematsu
T Mohanan
Tejaswini Deoskar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer